Abstract. Data mining is an important and challenging
problem for the efficient analysis of large astronomical
databases and will become even more important with the
development of the Global Virtual Observatory. In this
study, learning vector quantization (LVQ), single-layer
perceptron (SLP) and support vector machines (SVM)
are applied to multi-wavelength data classification.
A feature selection technique is used to evaluate the
significance of the considered features for the classification
results. We conclude that LVQ and SLP perform better when
few features are used, whereas SVM performs better when
more features are considered. The focus of automatic
classification is on the development of efficient feature-based
classifiers. The classifiers trained by these methods can be
used for preselecting AGN candidates.

Key words: Methods: data analysis. Methods: statistical. 
Astronomical data bases-catalogues 



1. Introduction 

Today, there are many impressive archives painstakingly
constructed from observations associated with an instrument.
The Hubble Space Telescope (HST), the Chandra
X-Ray Observatory, the Sloan Digital Sky Survey (SDSS),
the Two Micron All Sky Survey (2MASS), and the Digitized
Palomar Observatory Sky Survey (DPOSS) are examples
of this. Furthermore, yearly advances in electronics
bring new instruments, doubling the amount of data we
collect each year. For example, approximately a gigapixel
is deployed on all telescopes today, and new gigapixel
instruments are under construction. This trend is bound
to continue. As Szalay puts it, astronomy is facing a
"data avalanche" (see e.g. Szalay & Gray 2001).
How can we organize, use, and make sense of the enormous
amounts of data generated by today's instruments and
experiments? Doing so by hand is very time consuming and
demands highly qualified human resources. Therefore, better
features and better classifiers are required. In addition,
expert systems are useful for obtaining quantitative information.

These problems can be addressed with neural networks (NNs),
because they permit the application of expert knowledge
and experience through network training. Furthermore,
astronomical object classification based on neural networks
requires no a priori assumptions or knowledge of the data
to be classified, unlike some conventional methods. Over
the years, neural networks have proven to be a powerful
tool capable of extracting reliable information and patterns
from large amounts of data even in the absence of models
describing the data (cf. Bishop 1995), and they are finding
a wide range of applications in the astronomical community:
catalogue extraction (Andreon et al. 2000), star/galaxy
classification (Odewahn et al. 1992; Naim et al. 1995;
Miller & Coe 1996; Mahonen & Hakala 1995; Bertin & Arnouts
1996; Bazell & Peng 1998), galaxy morphology
(Storrie-Lombardi et al. 1992; Lahav et al. 1996), and
classification of stellar spectra (Bailer-Jones et al. 1998;
Allende Prieto et al. 2000; Weaver 2000), to name just a
few. These examples confirm the rising importance of
artificial neural networks in this kind of task. There is
also a very important and promising recent contribution by
Andreon et al. (2000) covering a large number of neural
algorithms.

In this work, we use a class of supervised neural networks
called learning vector quantization (LVQ). LVQ shares the
same network architecture as the Kohonen self-organizing
map (SOM), although it uses a supervised learning algorithm.
Bazell & Peng (1998) pioneered its use in astronomical
applications. We also use another class of supervised
neural networks, multi-layer perceptrons (MLP). Goderya &
McGuire (2000) summarized progress made in the development
of automated galaxy classifiers using neural networks
including MLP. Qu et al. (2003) compared MLP, radial basis
function (RBF), and support vector machine (SVM) classifiers
for solar-flare detection. Finally, we use an automated
classification algorithm called support vector machines
(SVM), an approach originally developed by Vapnik (1995).
Wozniak et al. (2001) and Humphreys et al. (2001)
pioneered the use of SVM in astronomy. Wozniak et
al. (2001) evaluated SVM, K-means and Autoclass for
automated classification of variable stars and compared their
effectiveness. Their results suggested a very high efficiency
of SVM in isolating a few best-defined classes against the
rest of the sample, and good accuracy for all classes
considered simultaneously. Humphreys et al. (2001) used
different classification algorithms, including decision trees,
K-nearest neighbor and support vector machines, to classify
the morphological types of galaxies, and obtained very
promising results in their first experiments.

Celestial objects radiate energy over an extremely wide
range of wavelengths, from radio waves to infrared, optical,
ultraviolet, X-ray and even gamma rays. Observations in
each of these bands carry important information about the
nature of the objects, and different physical processes show
different properties in different bands. We therefore apply
learning vector quantization (LVQ), single-layer perceptron
(SLP) and support vector machines (SVM) to classify
AGNs, stars and normal galaxies with data from the optical,
X-ray and infrared bands. In section 2 we present the
principles of LVQ, SLP and SVM. In section 3 we discuss
the sample selection and analyze the distribution of
parameters. In section 4 the computed results and discussion
are given. Finally, in section 5 we conclude with a
discussion of the general technique and its applicability.

2. The Methods Used 

2.1. Learning Vector Quantization 

The learning vector quantization (LVQ) algorithm adopted
here is based upon the LVQ_PAK routines developed at the
Laboratory of Computer and Information Sciences, Helsinki
University of Technology, Finland. The software can be
obtained from www.cis.hut.fi/research/lvq_pak/. Readers
interested in the application of LVQ in astronomy may
refer to Bazell & Peng (1998) and Cortiglioni et al. (2001).

The LVQ method was developed by Kohonen (1989),
who also developed the popular unsupervised classification
technique known as the self-organizing map (SOM) or
topological map neural network (Kohonen 1989, 1990). SOM
performs a mapping from an n-dimensional input vector onto
a two-dimensional array of nodes that is usually displayed
as a rectangular or hexagonal lattice. The mapping is
performed in such a way as to preserve the topology of the
input data. This means that input vectors that are similar
to each other in some sense are mapped to neighboring
regions of the two-dimensional output lattice. Each node
in the output lattice has an n-dimensional reference vector
of weights associated with it, one weight for each element
of the input vector. The SOM algorithm iteratively compares
the distance, in some suitable form, between each input
vector and each reference vector. With each iteration, the
reference vectors are moved around in the output space
until their positions converge to a stable state. When the
reference vector that is closest to a given input vector
is found (the winning reference vector), that reference
vector is updated to match the input vector more closely.
This is the learning step.

LVQ uses the same internal architecture as SOM: a
set of n-dimensional input vectors is mapped onto a
two-dimensional lattice, and each node on the lattice has an
n-dimensional reference vector associated with it. The
learning algorithm for LVQ, i.e., the method of updating the
reference vectors, is different from that of SOM. Because
LVQ is a supervised method, during the learning phase the
input data are tagged with their correct class. We define
the input vector $x$ as

$$x = (x_1, x_2, x_3, \ldots, x_n)$$ (1)

and the reference vector $w_i$ of the $i$-th output neuron as

$$w_i = (w_{1i}, w_{2i}, w_{3i}, \ldots, w_{ni})$$ (2)

The Euclidean distance between the input vector and
the reference vector of neuron $i$ is defined as

$$D(i) = \sqrt{\sum_{j=1}^{n} (x_j - w_{ji})^2}$$ (3)

The input vector is compared with all reference vectors,
and the closest match, for which $D(i)$ is a minimum, is
found using

$$\| w_{i^*} - x \| \le \| w_i - x \|$$ (4)

where $x$ is an input vector, $w_i$ are the reference vectors,
and $w_{i^*}$ is the winning reference vector. The reference
vectors are then updated using the following rules:
If $x$ is in the same class as $w_{i^*}$,

$$\Delta w_{i^*} = \alpha(t)\,(x - w_{i^*})$$ (5)

If $x$ is in a different class from $w_{i^*}$,

$$\Delta w_{i^*} = -\alpha(t)\,(x - w_{i^*})$$ (6)

If $i$ is not the index of the winning reference vector,

$$\Delta w_i = 0$$ (7)

The learning rate $0 < \alpha(t) < 1$ should generally be
made to decrease monotonically with time, yielding larger
changes for early iterations and finer tuning as convergence
is approached. The time $t$ takes positive integer values.
Here we adopt the optimized learning rate $\alpha(t)$
(see Kohonen et al. 1995)

$$\alpha(t) = \frac{\alpha(t-1)}{1 + s(t)\,\alpha(t-1)}$$ (8)

where $s(t) = +1$ if the classification is correct and
$s(t) = -1$ if the classification is wrong. In this work, the
initial value of $\alpha(t)$ is set to 0.3, whereby learning is
significantly sped up, especially in the beginning, and the
$w_i$ quickly find their approximate asymptotic values. We
adopt two hundred codebook vectors and use 7 neighbors in
the k-nearest-neighbor (knn) classification. The
network is trained for 5000 epochs. There are several
versions of the LVQ algorithm for which the learning rules
differ in some details. See Kohonen (1995) for an
explanation of the differences between these algorithms. When
the learning phase is over, the reference vectors can be
frozen, and any further inputs to the system will be placed
into one of the existing classes, but the classes will not change.
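
The update rules (4)-(8) can be sketched in a few lines of Python. This is a minimal illustration, not the LVQ_PAK implementation: the function names, the use of one learning rate per codebook vector, the cap that keeps the learning rate at or below its initial value, and the nearest-codebook (1-NN) classification in place of the 7-neighbor rule are all assumptions made for the example.

```python
import math
import random

def nearest(codebook, x):
    """Index of the reference vector closest to x (Eqs. 3-4)."""
    return min(range(len(codebook)),
               key=lambda i: math.dist(codebook[i][0], x))

def train_olvq1(samples, codebook, alpha0=0.3, epochs=20, seed=0):
    """samples: list of (vector, label); codebook: list of [vector, label]."""
    rng = random.Random(seed)
    alphas = [alpha0] * len(codebook)   # assumption: one rate per vector
    for _ in range(epochs):
        order = samples[:]
        rng.shuffle(order)
        for x, label in order:
            c = nearest(codebook, x)
            w, w_label = codebook[c]
            s = 1.0 if w_label == label else -1.0
            # Eq. (8); the cap at alpha0 is an assumption that keeps alpha < 1
            alphas[c] = min(alphas[c] / (1.0 + s * alphas[c]), alpha0)
            # Eqs. (5)-(6): pull towards x if the class matches, else push away
            codebook[c][0] = [wi + s * alphas[c] * (xi - wi)
                              for wi, xi in zip(w, x)]
    return codebook

def classify(codebook, x):
    """Label of the nearest reference vector (1-NN, a simplification)."""
    return codebook[nearest(codebook, x)][1]

# Toy two-class data in a 2D feature space (values are illustrative only).
samples = [([0.0, 0.0], "S&G"), ([0.2, 0.1], "S&G"),
           ([1.0, 1.0], "AGN"), ([0.9, 1.1], "AGN")]
codebook = [[[0.5, 0.4], "S&G"], [[0.6, 0.7], "AGN"]]
train_olvq1(samples, codebook)
```

After training, `classify(codebook, [0.1, 0.0])` returns the label of the reference vector that has moved towards the lower-left cluster.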

2.2. Support Vector Machines 

Support Vector Machines (SVM) are learning machines
that can perform binary classification (pattern recognition)
and real-valued function approximation (regression
estimation) tasks. An SVM creates functions from a set of
labeled training data and operates by finding a hypersurface
in the space of possible inputs. This hypersurface
attempts to split the positive examples from the negative
examples. The split is chosen to have the largest distance
from the hypersurface to the nearest of the positive
and negative examples. Intuitively, this makes the
classification correct for testing data that are near, but not
identical, to the training data. In detail, during the
training phase the SVM takes a data matrix as input, in which
each sample is labeled as either belonging to a given class
(positive) or not (negative). The SVM treats each sample in
the matrix as a point in a high-dimensional feature space,
where the number of attributes determines the dimensionality
of the space. The SVM learning algorithm then identifies a
hyperplane in this space that best separates the positive and
negative training samples. The trained SVM can then be
used to make predictions about a test sample's membership
in the class. In brief, an SVM non-linearly maps its
n-dimensional input space into a high-dimensional feature
space, in which a linear classifier is constructed. More
information can be found in Burges' tutorial (1998) or in
Vapnik's book (1995).
Given some training data

$$(x_1, y_1), \ldots, (x_l, y_l), \quad y_i \in \{-1, 1\}$$

if the data are linearly separable, one can separate them
by an infinite number of linear hyperplanes. We can write
these hyperplanes as

$$f(x, \alpha) = (w_\alpha \cdot x) + b$$ (9)

Among these hyperplanes, the one with the maximum
margin is called the optimal separating hyperplane.
This hyperplane is uniquely determined by the support
vectors on the margin. It satisfies the conditions

$$y_i\,[(w \cdot x_i) + b] \ge 1, \quad i = 1, \ldots, l$$ (10)

Besides satisfying the above conditions, the optimal
hyperplane has the minimal norm

$$\|w\|^2 = (w \cdot w)$$ (11)

The optimal hyperplane can be found by finding the
saddle point of the Lagrange functional

$$L(w, b, \alpha) = \frac{1}{2}(w \cdot w) - \sum_{i=1}^{l} \alpha_i \{ [(w \cdot x_i) + b]\, y_i - 1 \}$$ (12)

where $\alpha_i$ are Lagrange multipliers. The Lagrangian has
to be minimized with respect to $w$ and $b$ and maximized with
respect to $\alpha_i \ge 0$.

At the saddle point,

$$w = \sum_{i=1}^{l} \alpha_i x_i y_i$$ (13)

where $\alpha$ is the maximum point of

$$W(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$$ (14)

subject to the constraints

$$\sum_{i=1}^{l} \alpha_i y_i = 0, \quad \alpha_i \ge 0$$ (15)

Therefore the decision function defined by the optimal
separating hyperplane has the form

$$f(x) = \mathrm{sign}\Big( \sum_{\text{support vectors}} y_i \alpha_i (x_i \cdot x) + b \Big)$$ (16)

This solution only holds for linearly separable data.
For linearly non-separable data it has to be slightly
modified: the $\alpha_i$ have to be bounded,

$$0 \le \alpha_i \le C$$ (17)

where $C$ is a constant chosen a priori.

To generalize to non-linear classification, we replace
the dot product with a kernel $k(\cdot, \cdot)$. Stitson et al.
(1996) and Gunn (1998) describe binary classification in
detail; for multi-class classification, see Weston &
Watkins (1998).
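
To make the role of the support vectors and the kernel concrete, the following is a minimal sketch of how a trained SVM evaluates a new point. It assumes the multipliers and the offset are already known (it does not solve the optimization problem), it uses the sign convention of Eq. (9), in which the offset b is added, and the function names and the toy data are hypothetical.

```python
import math

def linear_kernel(u, v):
    """Plain dot product (u . v)."""
    return sum(a * b for a, b in zip(u, v))

def rbf_kernel(u, v, gamma=0.5):
    """Gaussian kernel; gamma is an illustrative choice."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def svm_decision(x, support_vectors, labels, alphas, b, kernel=linear_kernel):
    """Sign of sum_i y_i * alpha_i * k(x_i, x) + b over the support vectors."""
    score = sum(y * a * kernel(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if score >= 0 else -1

# Toy hard-margin problem with two training points. For this pair the dual
# objective W(alpha) = 2*alpha - 4*alpha**2 is maximal at alpha = 0.25, and
# b = 0 by symmetry, so both points lie exactly on the margin.
svs = [(1.0, 1.0), (-1.0, -1.0)]
ys = [1, -1]
alphas = [0.25, 0.25]
b = 0.0

label = svm_decision((2.0, 0.0), svs, ys, alphas, b)
```

Swapping `kernel=rbf_kernel` into the call changes only the similarity measure; the structure of the decision function stays the same, which is the point of the kernel substitution.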

2.3. Single-layer Perceptron 

Multi-layer perceptrons (MLP) are feedforward neural
networks trained with the standard backpropagation
algorithm. An MLP with no hidden layer is called a single-layer
perceptron (SLP). They are supervised networks, so they
require a desired response to be trained. They learn how to
transform input data into a desired response, so they are
widely used for pattern classification. With one or two
hidden layers, they can approximate virtually any input-output
map. They have been shown to approximate the performance
of optimal statistical classifiers in difficult problems. Most
neural network applications involve MLP.

The basic MLP building unit is a model of an artificial
neuron. This unit computes the weighted sum of the
inputs plus the threshold weight and passes this sum
through the activation function (usually sigmoid), (18),
(19):

$$v_j = \sum_{i=0}^{n} w_{ji} x_i$$ (18)

$$y_j = \varphi_j(v_j)$$ (19)

where $v_j$ is the linear combination of inputs $x_1, x_2, \ldots, x_n$
of neuron $j$, $w_{j0} = \theta_j$ is the threshold weight connected
to a special input $x_0 = -1$, $y_j$ is the output of neuron $j$,
and $\varphi_j(\cdot)$ is its activation function. Herein we use a
special form of sigmoidal (non-constant, bounded, and
monotone-increasing) activation function, the logistic function

$$y_j = \frac{1}{1 + \exp(-v_j)}$$ (20)

In a multilayer perceptron, the outputs of the units in one
layer form the inputs to the next layer. The weights of the
network are usually computed by training the network
using the back-propagation (BP) algorithm.

A multilayer perceptron represents a nested sigmoidal
scheme (18); its form for a single output neuron is

$$F(x, w) = \varphi\Big( \sum_j w_{oj}\, \varphi\Big( \sum_k w_{jk}\, \varphi\Big( \cdots \varphi\Big( \sum_i w_{li} x_i \Big) \cdots \Big) \Big) \Big)$$ (21)

where $\varphi(\cdot)$ is the sigmoidal activation function, $w_{oj}$ is
the synaptic weight from neuron $j$ in the last hidden layer
to the single output neuron $o$, and so on for the other
synaptic weights, and $x_i$ is the $i$-th element of the input
vector $x$. The weight vector $w$ denotes the entire set of
synaptic weights ordered by layer, then the neurons in a
layer, and then their number in a neuron.
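
For the case with no hidden layer, the neuron model of Eqs. (18)-(20) and its training reduce to a few lines. The sketch below is illustrative only: the squared-error gradient update, the learning rate, the zero initial weights, and the toy data are assumptions, not the configuration used in this work.

```python
import math

def logistic(v):
    """Eq. (20): logistic activation."""
    return 1.0 / (1.0 + math.exp(-v))

def neuron_output(weights, theta, x):
    """Eqs. (18)-(19): weighted sum with threshold input x0 = -1, then logistic."""
    v = -theta + sum(w * xi for w, xi in zip(weights, x))
    return logistic(v)

def train_slp(samples, n_inputs, rate=0.5, epochs=1000):
    """Gradient descent on squared error for a single logistic output neuron."""
    weights, theta = [0.0] * n_inputs, 0.0
    for _ in range(epochs):
        for x, target in samples:
            y = neuron_output(weights, theta, x)
            delta = (target - y) * y * (1.0 - y)   # error times logistic slope
            weights = [w + rate * delta * xi for w, xi in zip(weights, x)]
            theta -= rate * delta                  # x0 = -1 flips the sign
    return weights, theta

# Toy separable data: target 1 for one class, 0 for the other (illustrative).
samples = [((0.0, 0.1), 0), ((0.2, 0.0), 0),
           ((1.0, 0.9), 1), ((0.8, 1.0), 1)]
weights, theta = train_slp(samples, n_inputs=2)
```

An output above 0.5 from `neuron_output(weights, theta, x)` is then read as membership in the target class.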

3. Chosen Sample and Parameters

Usually, astronomical object classification is based on
spectral, photometric, or multiwavelength properties, among
others. In order to check the effectiveness and the
efficiency of the methods considered here, we classified
objects with data from the X-ray (ROSAT), optical (USNO-A2.0)
and infrared (2MASS) bands. We obtain the multi-wavelength
data by positional cross-correlation of the released ROSAT,
USNO-A2.0 and 2MASS databases. The three catalogs are
described in detail as follows:

The ROSAT All-Sky Survey (RASS), carried out with an imaging
X-ray telescope (Trümper 1983), is well suited for investigating
the X-ray properties of astronomical objects. The RASS
Bright Source Catalogue (RBSC) includes 18,811 sources,
with a limiting ROSAT PSPC count rate of 0.05 counts
s$^{-1}$ in the 0.1-2.4 keV energy band. The typical positional
accuracy is 30". Similarly, the RASS Faint Source Catalogue
(RFSC) contains 105,924 sources and represents the
faint extension to the RBSC. The RBSC and RFSC catalogues
contain the ROSAT name, positions in equatorial coordinates,
the positional error, the source count rate (CR) and
error, the background count rate, exposure time, hardness
ratios HR1 and HR2 and errors, extent (ext) and likelihood
of extent (extl), and likelihood of detection. The
two hardness ratios HR1 and HR2 represent X-ray colors.
From the count rate A in the 0.1-0.4 keV energy band
and the count rate B in the 0.5-2.0 keV energy band,
HR1 is given by HR1 = (B - A)/(B + A). HR2 is
determined from the count rate C in the 0.5-0.9 keV energy
band and the count rate D in the 0.9-2.0 keV energy band
by HR2 = (D - C)/(D + C). CR is the ROSAT total count
rate in counts s$^{-1}$. The parameters ext and extl are the
source extent in arcseconds and the likelihood of the source
extent, respectively. The value of ext specifies the amount
by which the source image exceeds the point spread function.
Together, ext and extl indicate whether a source is a point
source or an extended source. For example, stars and
quasars are point sources, while galaxies and galaxy clusters
are extended sources. Therefore ext and extl are useful for
the classification of objects.
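
As a worked example of the two hardness ratios defined above, the sketch below evaluates HR1 and HR2 for one source. The count rates are hypothetical values chosen only for illustration, and the handling of a zero total rate is an assumption.

```python
def hardness_ratio(soft, hard):
    """(hard - soft) / (hard + soft); None when both rates are zero."""
    total = hard + soft
    if total == 0:
        return None
    return (hard - soft) / total

# Hypothetical count rates in counts/s, for illustration only.
A, B = 0.03, 0.09   # 0.1-0.4 keV and 0.5-2.0 keV bands
C, D = 0.05, 0.04   # 0.5-0.9 keV and 0.9-2.0 keV bands

hr1 = hardness_ratio(A, B)   # HR1 = (B - A) / (B + A)
hr2 = hardness_ratio(C, D)   # HR2 = (D - C) / (D + C)
```

Both ratios lie between -1 (all counts in the softer band) and +1 (all counts in the harder band), which is why they behave like colors.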

The USNO-A2.0 (Monet et al. 1998) is a catalog of 
526,280,881 stars over the full sky, compiled in the U.S. 
Naval Observatory, which contains stars down to about 20 
mag over the whole sky. Its astrometric precision is non- 
uniform, depending on position on Schmidt plates, typi- 
cally better than 1". USNO-A2.0 presents right ascension 
and declination (J2000, epoch of the mean of the blue and 
red plate) and the blue and red magnitude for each star. 

The infrared data come from the first large incremental data
release of the Two Micron All Sky Survey (2MASS).
This release covers 2,483 square degrees of northern sky
observed from the 2MASS facility at Mt. Hopkins, AZ.
The catalogue contains 20.2 million point sources and 73,980
extended sources, and includes magnitudes in three bands:
J (1.25 μm), H (1.65 μm), and K_s (2.17 μm).

For supervised methods, the input sample must be
tagged with known classes, so catalogues of astronomical
objects with known classes need to be adopted. We
choose known AGNs from the catalog of AGN (Veron-Cetty
& Veron, 2000), which contains 13214 quasars, 462
BL Lac objects and 4428 active galaxies (of which 1711
are Seyfert 1). Stars include all spectral classes of stars,
dwarfs and variable stars, and are taken from the SIMBAD
database. Normal galaxies are from the Third Reference
Catalogue of Bright Galaxies (RC3; de Vaucouleurs et al.
1991).

Studying the clustering properties of astronomical objects
in a multidimensional parameter space requires catalogue
cross-correlation to obtain multi-wavelength parameters
for all sources. Firstly, within a search radius of
3 times the RBSC and RFSC positional error, we
positionally cross-identified the USNO-A2.0 catalogue
with the RBSC and RFSC X-ray sources, and then
cross-matched the data from the X-ray and optical bands with
infrared sources in the first 2MASS release within a 10
arcsec radius. Secondly, we similarly cross-identified the
data from the three bands with the catalogues of AGNs, stars
and normal galaxies within a 5 arcsec radius. Considering
only the unique entries, the total sample contains 1656
(29.9%) AGNs, 3718 (67.0%) stars, and 173 (3.1%) normal
galaxies.
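
A positional cross-identification that keeps only unique matches can be sketched as follows. This is a brute-force illustration under stated assumptions: the haversine formula for the angular separation, the tuple layout `(name, ra_deg, dec_deg)`, the function names, and the toy coordinates are all hypothetical, and a real catalogue of this size would need a spatial index rather than a double loop.

```python
import math

def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0

def unique_matches(sources, catalogue, radius_arcsec):
    """Keep a source only if exactly one catalogue entry lies within the radius."""
    matched = {}
    for name, ra, dec in sources:
        hits = [cname for cname, cra, cdec in catalogue
                if angular_separation_arcsec(ra, dec, cra, cdec) <= radius_arcsec]
        if len(hits) == 1:
            matched[name] = hits[0]
    return matched

# Toy positions in degrees (illustrative only).
xray = [("X1", 10.0, 20.0), ("X2", 50.0, -5.0), ("X3", 100.0, 0.0)]
optical = [("O1", 10.0005, 20.0), ("O2", 50.0, -5.0005),
           ("O3", 50.0005, -5.0), ("O4", 200.0, 30.0)]

matches = unique_matches(xray, optical, radius_arcsec=10.0)
```

Here X1 has a single counterpart and is kept, X2 has two candidates within the radius and is dropped as a multiple entry, and X3 has none.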

In the whole process, the data obtained for AGNs, stars
and galaxies with catalogue counterparts are divided into
four subclasses: (i) unique entries, (ii) multiple entries,
(iii) same entries, and (iv) no entries. In detail, unique
entries refer to objects that have only one catalogue
entry in the various catalogues, or that have a unique
identification in private catalogues. Multiple entries refer
to objects that have more than one catalogue entry
in the various catalogues. Same entries refer to two
or three kinds of objects that share the same catalogue
counterparts. No entries means that the objects cannot
be matched in one or more catalogues because of the
incompleteness of the catalogues. We note that the sample
here is obtained by multi-wavelength cross-identification.
Because of positional errors, some sources unavoidably
match unrelated or spurious sources. In order to keep the
sources as reliable as possible, we consider only the unique
entries and discard the multiple entries, the same entries
and the no entries. Strictly, to know which sources are true
counterparts, we would need to compute the probability of
each identification across the three bands, as Mattox et al.
(1997) and Rutledge et al. (2000) do with cross-association.
Owing to the restricted aim of this work, we do not
investigate this aspect in detail.

In this paper, the plausibility of an identification is based
on the optical classification, X-ray characteristics such as
the hardness ratios and the extent parameter, and the infrared
classification (Stocke et al. 1991; Motch et al. 1998; Pietsch
et al. 1998; He et al. 2001). According to the results of the
Einstein Medium Sensitivity Survey (EMSS; Stocke et al. 1991),
the X-ray-to-optical flux ratio, $F_x/F_{opt}$, was found to be
very different for different classes of X-ray emitters. Motch et
al. (1998) stated that, for source classification, the most
interesting parameters are flux ratios in various energy
bands, including the conventional X-ray hardness ratios,
$F_x/F_{opt}$ ratios, and optical colors. They also showed
that, although stars and AGNs have similar X-ray colors,
their mean X-ray-to-optical ratios are quite different and
they are well separated in the HR1/2 vs. $F_x/F_{opt}$
diagram. Cataclysmic variables exhibit a large range of X-ray
colors and $F_x/F_{opt}$ ratios and can be somewhat confused
with both AGNs and the most active part of the stellar
population. However, the addition of a B - V or U - B optical
index would allow these overlapping populations to be
distinguished further. He et al. (2001) stated that galactic
stars usually have bright optical magnitudes and weak X-ray
emission, galaxies have fainter optical magnitudes and
intermediate X-ray emission, and AGNs have the faintest
magnitudes and strongest X-ray emission. In their Figure 1,
of $F_x/F_{opt}$ vs. $m_V$, AGNs and non-AGNs occupy different
zones. Pietsch et al. (1998) also used a conservative extent
criterion (extent likelihood > 10 and extent > 30") as an
indicator that the X-ray emission does not originate from a
nuclear source. Since the corresponding parameter spaces
overlap significantly for different classes of objects, an
unambiguous identification based on data from one band alone
is not possible. In order to classify sources, we therefore
consider data from the optical, X-ray and infrared bands. The
chosen parameters from the different bands are B - R (optical
index), B + 2.5 log(CR) (optical-X-ray index), CR, HR1
(X-ray index), HR2 (X-ray index), ext, extl, J - H (infrared
index), H - K_s (infrared index), and J + 2.5 log(CR)
(infrared-X-ray index). Motch et al. (1998) showed that
the X-ray-to-optical flux ratio can be approximated by
$\log(F_x/F_{opt}) = \log(\mathrm{count\ rate}) + V/2.5 - 5.63$,
assuming an average energy conversion factor of 1 PSPC cts
s$^{-1}$ for a $10^{-11}$ erg cm$^{-2}$ s$^{-1}$ flux in the range
of 0.1 to 2.4 keV. So B + 2.5 log(CR) can be viewed as an
X-ray-to-optical flux ratio; similarly, J + 2.5 log(CR) is an
X-ray-to-infrared flux ratio.
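
The link between the Motch et al. (1998) approximation and the magnitude-plus-count-rate index can be checked numerically. In the sketch below the function names and the sample values are hypothetical; the only relation used is that multiplying the approximation by 2.5 gives V + 2.5 log(CR) - 14.075, so the index is a linear function of the logarithmic flux ratio. The paper applies the same form with B and J magnitudes, for which the additive constant would differ.

```python
import math

def log_fx_over_fopt(count_rate, v_mag):
    """log(Fx/Fopt) = log(count rate) + V/2.5 - 5.63 (Motch et al. 1998)."""
    return math.log10(count_rate) + v_mag / 2.5 - 5.63

def xray_optical_index(count_rate, mag):
    """mag + 2.5 log(CR), the index used as a classification feature."""
    return mag + 2.5 * math.log10(count_rate)

# Hypothetical source: 0.2 counts/s and magnitude 15.0 (illustrative values).
cr, mag = 0.2, 15.0
index = xray_optical_index(cr, mag)
ratio = log_fx_over_fopt(cr, mag)

# The index equals 2.5 * log(Fx/Fopt) plus a constant (2.5 * 5.63 = 14.075).
offset = index - 2.5 * ratio
```

Because the offset is the same for every source, ranking or thresholding sources by the index is equivalent to ranking them by the flux ratio in that band.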

The mean values of the parameters for the sample are
given in Table 1. Table 1 indicates that some mean values
have rather large scatter. The B - R value of normal
galaxies is clearly larger than those of AGNs and stars;
the CR value of AGNs is higher than those of stars and
normal galaxies. For the mean values of HR2, which
subdivides the hard range, there are only marginal differences
between the individual classes of objects in the total
sample, although galaxies tend to have somewhat higher
<HR2> values than AGNs and stars. AGNs and stars have on
average the lower HR1, i.e., they have the softer spectral
energy distribution (SED). A significantly harder SED is
found for normal galaxies, with <HR1> = +0.65. This is
indeed what is expected for this class of objects, which
exhibit a rather hard intrinsic spectrum caused by thermal
bremsstrahlung from a hot ($10^7 - 10^8$ K) plasma (cf. e.g.
Böhringer 1996). The mean values of ext and extl of normal
galaxies are clearly larger than those of AGNs and stars,
and those of AGNs are larger than those of stars. As
Table 1 shows, galaxies are redder than stars in both
J - H (0.76 mag) and H - K_s (0.37 mag). Likewise, AGNs
are redder than stars. We also find that the mean
<B + 2.5 log(CR)> and <J + 2.5 log(CR)> values of AGNs
are much higher than those of stars and galaxies. This can
be explained by the fact that AGNs are strong X-ray emitters.

6 Zhang Yanxia, Zhao Yongheng: Automated Clustering Algorithms for Classification of Astronomical Objects

Table 1. The mean values of parameters for the samples

No.   Parameter         AGNs          Stars         Normal galaxies
1     B-R               0.41±0.78     -1.53±4.19    1.42±1.49
2     B+2.5log(CR)      13.66±1.83    4.18±5.33     7.95±2.40
3     CR                0.20±0.43     0.12±0.42     0.08±0.13
4     HR1               0.09±0.53     0.09±0.53     0.65±0.37
5     HR2               0.14±0.41     -0.02±0.54    0.22±0.48
6     ext               6.28±9.52     4.21±9.72     16.11±32.12
7     extl              1.88±6.62     1.05±6.74     7.81±31.15
8     J-H               0.73±0.23     0.24±1.77     0.76±0.17
9     H-K_s             0.76±0.27     0.09±0.11     0.37±0.19
10    J+2.5log(CR)      12.80±1.27    4.33±1.8      9.75±1.54

In order to see the differences between the classes of
astronomical objects, we plot histograms of the parameters,
following the method used in Voges et al. (1999). In Fig. 1
we present the distributions of the ten parameters for AGNs
and S&G, where S&G is short for stars and normal galaxies.
The horizontal axes are labeled by the parameters and
the vertical axes by the number of sources.
From the distributions of B - R, B + 2.5 log(CR), J - H,
H - K_s, and J + 2.5 log(CR), it is obvious that AGNs differ
from stars and galaxies, whereas for the distributions
of CR, HR1, HR2, ext, and extl, AGNs overlap seriously
with stars and galaxies. In other words, B + 2.5 log(CR)
and J + 2.5 log(CR) are the most important attributes
for classification; B - R, J - H and H - K_s are the next
most important; the others contribute little. To determine
the best combination of parameters for discriminating AGNs
from stars and galaxies, we probe 2-dimensional,
5-dimensional and 10-dimensional spaces. Hereafter
D is short for dimensional. The goal of the paper is
to assess the discriminating power of learning vector
quantization (LVQ), support vector machines (SVM) and
single-layer perceptron (SLP), which we explore in the
following section.

4. Results and Discussion 

4.1. Results 

Since B + 2.5 log(CR) and J + 2.5 log(CR) may be used
as important features, we select B + 2.5 log(CR) = 11.8
and J + 2.5 log(CR) = 10.5 as classification criteria
to discriminate stars and galaxies from AGNs. If
B + 2.5 log(CR) > 11.8, the object is assigned to AGNs,
otherwise to stars and galaxies. Similarly, if
J + 2.5 log(CR) > 10.5, the object is assigned to AGNs,
otherwise to stars and galaxies. We consider three
situations: using only the B + 2.5 log(CR) = 11.8 criterion,
using only the J + 2.5 log(CR) = 10.5 criterion, and using
both criteria. The results of classification are presented
in Table 2. The overall accuracy for the three situations is
94.0%, 96.5% and 94.9%, respectively. Evidently, the results
are comparable for the three situations.
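
The cutoff rule and the accuracy figures can be reproduced with a short sketch. The function names and the example source are hypothetical; the four counts are taken from the first situation in Table 2, read with columns as the known class and rows as the assigned class, which is an interpretation of the table layout.

```python
import math

def classify_by_cutoff(mag, count_rate, cutoff):
    """'AGN' if mag + 2.5 log(CR) exceeds the cutoff, else 'S&G'."""
    return "AGN" if mag + 2.5 * math.log10(count_rate) > cutoff else "S&G"

def accuracies(agn_as_agn, agn_as_sg, sg_as_agn, sg_as_sg):
    """Per-class and total accuracy in percent from the four counts."""
    n_agn = agn_as_agn + agn_as_sg
    n_sg = sg_as_agn + sg_as_sg
    agn = 100.0 * agn_as_agn / n_agn
    sg = 100.0 * sg_as_sg / n_sg
    total = 100.0 * (agn_as_agn + sg_as_sg) / (n_agn + n_sg)
    return agn, sg, total

# Hypothetical source with B = 15.0 and CR = 0.2 counts/s, B-band cutoff 11.8.
label = classify_by_cutoff(15.0, 0.2, 11.8)

# Situation 1 of Table 2: known AGNs split 1475/181, known S&G split 151/3740.
agn_acc, sg_acc, total_acc = accuracies(1475, 181, 151, 3740)
```

With these counts the per-class accuracies come out near 89% and 96% and the total near 94%, in line with the tabulated values.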

In order to understand which parameter combination
is best, we explore the 2D space composed of
B + 2.5 log(CR) and J + 2.5 log(CR), the 5D space composed of
B + 2.5 log(CR), B - R, J - H, H - K_s and J + 2.5 log(CR),
and the 10D space composed of all ten parameters. We
randomly divide the sample into two parts, one as the
training set and the other as the test set, and train a
classifier on the training set with each method. With the
test set, we then check how well the classifiers perform.
If they perform well, the classifiers can be used for
predicting the classes of unknown sources. Firstly,
we apply learning vector quantization (LVQ) to separate
AGNs from stars and normal galaxies. The results are
given in Table 3. In the 2D, 5D and 10D spaces, the total
accuracy is 97.66%, 97.69% and 97.80%, respectively.
Secondly, we employ support vector machines (SVM) in the
different spaces. The computed results are shown in Table 4:
the total accuracy amounts to 75.52%, 98.09% and 98.31%,
respectively. Thirdly, for comparison, we apply the
single-layer perceptron (SLP); the results are given in
Table 5. We train a perceptron with two input
neurons, one output neuron and no hidden neurons for
1000 epochs. In the 2D, 5D and 10D spaces, the total accuracy
is 97.69%, 98.09% and 98.05%, respectively.
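
A random division into training and test sets can be sketched as below. The text only states that the sample is divided randomly into two parts; splitting each class separately in half is an assumption made here, although it is consistent with the test-set sizes implied by Tables 3 and 4 (828 AGNs and 1946 stars and galaxies). The function name and the seed are hypothetical.

```python
import random

def stratified_split(labels, seed=0):
    """Return (train_idx, test_idx), splitting each class roughly in half."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for members in by_class.values():
        rng.shuffle(members)
        half = len(members) // 2
        train.extend(members[:half])   # first half of the shuffled class
        test.extend(members[half:])    # remainder, one larger for odd sizes
    return sorted(train), sorted(test)

# Class sizes from the sample: 1656 AGNs and 3891 stars plus normal galaxies.
labels = ["AGN"] * 1656 + ["S&G"] * 3891
train_idx, test_idx = stratified_split(labels)
n_test_agn = sum(1 for i in test_idx if labels[i] == "AGN")
```

The indices returned by `stratified_split` can then be used to select the feature vectors for training and for evaluation.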

From the results of LVQ, SVM and SLP, it is clear
that their performances are comparable. In low-dimensional
spaces, LVQ and SLP are better than SVM, but
in high-dimensional spaces SVM shows its superiority.
Comparing just LVQ and SLP, SLP is better. When
enough attributes are considered, the automated methods
are superior to the simple physical cutoff. Moreover,
the high accuracy suggests that useful features for
classification can be extracted from the histograms,
i.e. the histogram method may be used as a practical
feature selection technique.

4.2. Discussion

Table 2 shows that the efficiency of classification is rather
high, more than 90%, when only the important features are
considered. Clearly, choosing a few good features for
classification is simple and practical. But compared to
the results of the automated algorithms, such a method
is somewhat less accurate. After all, the method is limited by
Table 2. Result of classification for three situations

                        1                   2                   3
classified↓ known→      AGNs      S&G       AGNs      S&G       AGNs      S&G
AGNs                    1475      151       1574      112       1598      226
S&G                     181       3740      82        3779      58        3665
accuracy                89.0%     96.0%     95.0%     97.0%     96.5%     94.0%
Total accuracy          94.0%               96.5%               94.9%





Table 3. Result of classification by LVQ

                        2D space            5D space            10D space
classified↓ known→      AGNs      S&G       AGNs      S&G       AGNs      S&G
AGNs                    824       61        828       64        828       61
S&G                     4         1885      0         1882      0         1885
accuracy                99.52%    96.87%    100.0%    96.71%    100.0%    96.87%
Total accuracy          97.66%              97.69%              97.80%



Table 4. Result of classification by SVM

                        2D space            5D space            10D space
classified↓ known→      AGNs      S&G       AGNs      S&G       AGNs      S&G
AGNs                    150       1         782       7         818       37
S&G                     678       1945      46        1939      10        1909
accuracy                18.12%    99.95%    94.44%    99.64%    98.79%    98.10%
Total accuracy          75.52%              98.09%              98.31%





itself, for it cannot avoid losing information when only a
few features are used. What is more, it is sometimes very
difficult to find such good features. Only with the help of
other tools, such as principal component analysis (Folkes et
al. 1996, Zhang et al. 2003), can we find the principal
features. If the number of principal components is more than
3, a simple cutoff is no longer applicable because the data
are difficult to visualize. It is therefore better to apply
automatic approaches in such situations.
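
To illustrate how principal component analysis finds a principal feature, the sketch below estimates the first principal axis of a small data set by power iteration on the covariance matrix. It is a minimal pure-Python sketch under stated assumptions: the function name, the starting vector and the toy data are hypothetical, and it is not the procedure of the works cited above.

```python
import math


def first_principal_component(rows, iterations=200):
    """Leading principal axis of `rows` (a list of equal-length feature lists)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered features.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeated multiplication by the covariance matrix
    # turns the vector towards the eigenvector with the largest eigenvalue.
    # Assumption: the start vector is not orthogonal to that eigenvector.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical toy data lying exactly on the line y = 2x.
toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
axis = first_principal_component(toy)
print(axis)  # direction proportional to (1, 2)
```

Projecting each source onto a few such axes reduces the number of features, but once more than three axes are needed the projected data can no longer be inspected by eye, which is the limitation of the simple cutoff noted above.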

For LVQ and SLP, as shown in Tables 3 and 5, the results are
hardly affected by the number of dimensions as long as the
space contains the important features. For SVM, in contrast,
the results in Table 4 depend strongly on the number of
dimensions even when the important features are included:
the more parameters considered, the higher the accuracy. For
low-dimensional spaces LVQ and SLP are better, while for
high-dimensional spaces SVM shows its superiority. Moreover,
the statistics listed in Tables 3-5 show how well the
algorithms did in classifying AGN and non-AGN objects. These
statistics tell us how effective a given method is at
correctly identifying a true AGN as an AGN or a true non-AGN
as a non-AGN, in other words, how often the method
misidentifies objects. If the number of AGNs identified as
non-AGNs were zero, the classification accuracy of AGNs would
be 100%. Conversely, if the number of non-AGNs identified as
AGNs were zero, the classification accuracy of stars and
normal galaxies would be 100%. The generally lower
classification accuracy of AGNs compared to that of stars and
normal galaxies may be a result of the smaller sample size
for AGNs (1656 vs. 3891). This suggests that it would be
useful to run these tests again with a larger sample for the
methods examined here. Given our results, we are encouraged
that distinguishing between a number of different types of
objects should be possible. For such a project, a larger
number of samples of each type of object would be necessary
to distinguish adequately between the classes. Comparing the
computed results, we conclude that LVQ, SVM and SLP are
effective methods for classifying sources with
multi-wavelength data. With the data from three bands, we can
separate AGNs from stars and normal galaxies effectively by
LVQ, SLP or SVM. This also indicates that the chosen
parameters are good feature vectors for separating AGNs from
stars and normal galaxies. We believe the performance will
increase if the

Table 5. Result of Classification by SLP

                           2D space          5D space          10D space
classified↓ known→       AGNs      S&G     AGNs      S&G     AGNs      S&G
AGNs                      826       62      826       51      825       51
S&G                         2     1884        2     1895        3     1895
accuracy               99.76%   96.81%   99.76%   97.38%   99.64%   97.38%
Total accuracy              97.69%            98.09%            98.05%





data are complete or the quality and quantity of data
improve. Moreover, these methods can be used to preselect
AGNs from large numbers of sources in large surveys, saving
time and effort in studies of AGNs or cosmology. The three
supervised learning methods we investigated here gave
comparable results in a number of situations. Generally, the
more features considered, the better the results SVM gave,
whereas the results of LVQ and SLP remained comparable for
different numbers of attributes. Also, the different
methods, while giving results of different quality in a
number of cases, were comparable for most of the samples we
examined. However, our results suggest that the parameters we
chose did not adequately pick out characteristics of the
objects in all cases. Other parameters added from more bands
that effectively summarize the features of sources, such as
from the radio band, appear to do better (Krautter et al.
1999). With them we could improve the classification accuracy
of AGNs or stars and normal galaxies, and even classify
different types of AGNs. Moreover, these methods can be used
for other types of data, such as spectral data and
photometric data. We believe that it would be beneficial to
have more extensive comparisons between different methods.
Only then can we take some of the magic out of determining
which parameters to choose and know which method is better
to use in different cases.
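
The per-class and total accuracies discussed above follow directly from the four counts of each two-class table. The short Python sketch below recomputes them for the 2D-space columns of Table 4 (SVM), reading the rows as the assigned class and the columns as the known class. The function and argument names are illustrative assumptions, not part of the original analysis.

```python
def class_accuracies(agn_as_agn, sg_as_agn, agn_as_sg, sg_as_sg):
    """Per-class and total accuracy (in percent) from a 2x2 table of counts.

    Naming convention (illustrative): 'x_as_y' is a known x classified as y.
    """
    n_agn = agn_as_agn + agn_as_sg   # all known AGNs in the test set
    n_sg = sg_as_agn + sg_as_sg      # all known stars and normal galaxies
    agn_acc = 100.0 * agn_as_agn / n_agn
    sg_acc = 100.0 * sg_as_sg / n_sg
    total_acc = 100.0 * (agn_as_agn + sg_as_sg) / (n_agn + n_sg)
    return agn_acc, sg_acc, total_acc


# Counts from the 2D-space columns of Table 4 (SVM).
agn_acc, sg_acc, total_acc = class_accuracies(
    agn_as_agn=150, sg_as_agn=1, agn_as_sg=678, sg_as_sg=1945
)
print(f"{agn_acc:.2f}% {sg_acc:.2f}% {total_acc:.2f}%")  # 18.12% 99.95% 75.52%
```

The example also shows why the total accuracy alone can mislead: in the 2D space SVM assigns almost every source to the stars and normal galaxies, so the total stays above 75% although only 150 of the 828 known AGNs are recovered.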



The performance of LVQ and SLP differs from that of SVM
because the methods are based on different theories. SVM
embodies the Structural Risk Minimization (SRM) principle,
which is superior to the Empirical Risk Minimization (ERM)
principle that conventional neural networks employ. Most
neural networks, including LVQ and SLP, are designed to find
a separating hyperplane, which is not necessarily optimal.
In fact, many neural networks start with a random line and
move it until all training points are on the right side of
the line. This tends to leave training points very close to
the line in a non-optimal way. In SVM, by contrast, a
large-margin classifier is sought, i.e. a line approaching
the optimal one. As a result, SVM shows better performance
than LVQ and SLP in the high-dimensional space.
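
The difference between any separating line and a large-margin line can be seen in a small example. The sketch below trains a plain perceptron until all training points are on the right side of the line, then measures how close the nearest point lies to that line, and compares this with a hand-picked line whose nearest point is farther away. The toy points, the zero starting weights (used here instead of a random line) and the hand-picked line are assumptions for illustration; no SVM solver is involved.

```python
import math


def train_perceptron(points, labels, epochs=100):
    """Perceptron rule: stop as soon as every training point is on the right side."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), y in zip(points, labels):
            # A point is wrong if it lies on the line or on the wrong side.
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:
                w[0] += y * x1
                w[1] += y * x2
                b += y
                errors += 1
        if errors == 0:
            break
    return w, b


def min_margin(points, labels, w, b):
    """Smallest signed distance from any training point to the line w.x + b = 0."""
    norm = math.hypot(w[0], w[1])
    return min(y * (w[0] * x1 + w[1] * x2 + b) / norm
               for (x1, x2), y in zip(points, labels))


# Hypothetical, linearly separable toy set: label +1 and label -1.
points = [(2.0, 2.0), (3.0, 1.0), (0.0, 0.0), (1.0, 0.0)]
labels = [1, 1, -1, -1]

w, b = train_perceptron(points, labels)
perceptron_margin = min_margin(points, labels, w, b)
# Hand-picked line x1 + x2 = 2.5, whose nearest training point is farther away.
wide_margin = min_margin(points, labels, [1.0, 1.0], -2.5)
print(perceptron_margin, wide_margin)
```

Both lines classify the toy training set without error, so an empirical-risk criterion cannot tell them apart; the margin is the extra quantity that a large-margin method optimizes.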



5. Conclusion 

Source classification depends on the quality and amount of
real-time data and on the algorithm used to extract
generalized mappings. The availability of high-resolution
multi-wavelength data is constantly increasing. The best
possible use of this observational information requires
efficient processing and generalization of high-dimensional
input data. Moreover, good feature selection techniques, as
well as good data mining methods, are in great demand. A very
promising algorithm that combines the power of the best
nonlinear techniques with tolerance to very high-dimensional
data is the support vector machine (SVM). In this work we
have used histograms as the feature selection technique and
applied LVQ, SLP and SVM to multi-wavelength astronomy to
separate AGNs from stars and normal galaxies. We conclude
that the features selected by histograms are applicable and
that the performance of SVM models can be comparable or
superior to that of the NN-based models in the
high-dimensional space. The advantages of the SVM-based
techniques are expected to be much more pronounced in future
large multi-wavelength surveys, which will incorporate many
types of high-dimensional, multi-wavelength input data once
real-time availability of this information becomes
technologically feasible. All these methods can be used for
astronomical object classification, data mining and
preselecting AGN candidates for large surveys, such as the
Large Sky Area Multi-Object Fiber Spectroscopic Telescope
(LAMOST). Various data, including morphology, photometry,
spectral data and so on, can be used to train the methods and
obtain classifiers that classify astronomical objects or
preselect interesting objects. When training sets are
lacking, we may explore unsupervised methods or
outlier-finding algorithms to find unusual, rare, or even new
types of objects and phenomena. In addition, with the
development of the Virtual Observatory, these methods will be
part of the toolkits of the International Virtual
Observatory.

Fig. 1. Ten histograms summarizing some results of the
analysis of RBSC and RFSC sources for 1656 AGNs (solid
line) and 3891 S&G (dotted line).



